Abstract

Inference of evolutionary trees and rates from biological sequences is commonly 
performed using continuous-time Markov models of character change. The 
Markov process evolves along an unknown tree while observations arise only 
from the tips of the tree. Rate heterogeneity is present in most real data sets 
and is accounted for by the use of flexible mixture models where each site is 
allowed its own rate. Very little has been rigorously established concerning 
the identifiability of the models currently in common use in data analysis,
although non-identifiability was proven for a semi-parametric model and an
incorrect proof of identifiability was published for a general parametric model
(GTR+Γ+I). Here we prove that one of the most widely used models (GTR+Γ)
is identifiable for generic parameters, and for all parameter choices in the case of 
4-state (DNA) models. This is the first proof of identifiability of a phylogenetic 
model with a continuous distribution of rates. 
Keywords: phylogenetics, identifiability 

AMS 2000 Subject Classification: Primary 60J25 

Secondary 92D15, 92D20 

1. Introduction 

A central goal of molecular phylogenetics is to infer evolutionary trees from 
DNA or protein sequences. Such sequence data come from extant species at the 
tips of the tree - the tree of life - while the topology of the tree relating these 
species is unknown. Inferring this tree helps us understand the evolutionary 
relationships between sequences. 

Phylogenetic data analysis is often performed using Markovian models of 
evolution: Mutations occur along the branches of the tree under a finite-state 
Markov process. There is ample evidence that some places in the genome 
undergo mutations at a high rate, while other loci evolve very slowly, perhaps 
due to some functional constraint. Such rate variation occurs at all spatial 
scales, across genes as well as across sites within genes. In performing inference, 
this heterogeneity is accounted for by the use of flexible mixture models where 
each site is allowed its own rate according to a rate distribution $\mu$. In the
context of molecular phylogenetics, the use of a parametric family for $\mu$ is
generally considered both advantageous and sufficiently flexible.

The question of identifiability for such a rate-variation model is a funda- 
mental one, as standard proofs of consistency of statistical inference methods 
begin by establishing identifiability. Without identifiability, inference of some 
or all model parameters may be unjustified. However, since phylogenetic data 
is gathered only from the tips of the tree, understanding when one has identifi- 
ability of the tree topology and other parameters for phylogenetic models poses 
substantial mathematical challenges. Indeed, it has been shown that the tree 
and model parameters are not identifiable if the distribution of rates is too 
general, even when the Markovian mutation model is quite simple [13].

The most commonly used phylogenetic model is a general time-reversible 
(GTR) Markovian mutation model along with a Gamma distribution family
(Γ) for $\mu$. For more flexibility, a class of invariable sites (I) can be added by
allowing $\mu$ to be the mixture of a Gamma distribution with an atom at 0.
Numerous studies have shown that the addition to the GTR model of rate
heterogeneity through Γ, I, or both, can considerably improve fit to data at
the expense of only a few additional parameters. In fact, when model selection
procedures are performed, the GTR+Γ+I model is preferred in most studies.
These stochastic models are the basis of hundreds of publications every year in 
the biological sciences — over 40 in Systematic Biology alone in 2006. Their 
impact is immense in the fields of evolutionary biology, ecology, conservation 
biology, and biogeography, as well as in medicine, where, for example, they 
appear in the study of the evolution of infectious diseases such as HIV and 
influenza viruses. 

The main result claimed in the widely-cited paper [11] is the following: 

The 4-base (DNA) GTR+Γ+I model, with unknown mixing parameter and Γ
shape parameter, is identifiable from the joint distributions of pairs of taxa.

However, the proof given in [11] of this statement is flawed; in fact, two gaps
occur in the argument. The first gap is in the use of an unjustified claim
concerning graphs of the sort exemplified by Figure 3 of that paper. As this 
claim plays a crucial role in the entire argument, the statement above remains 
unproven. 

The second gap, though less sweeping in its impact, is still significant. As- 
suming the unjustified graphical claim mentioned above could be proved, the 
argument of [11] still uses an assumption that the eigenvalues of the GTR rate
matrix be distinct. While this is true for generic GTR parameters, there are 
exceptions, including the well-known Jukes-Cantor and Kimura 2-parameter 
models [1]. Without substantial additional arguments, the reasoning given in 
[11] cannot prove identifiability in all cases.

Furthermore, bridging either of the gaps in [11] is not a trivial matter. 
Though we suspect that Rogers' statement of identifiability is correct, at least 
for generic parameters, we have not been able to establish it by his methods. 
For further exposition on the nature of the gaps, see the Appendix. 

In this paper, we consider only the GTR+Γ model, but for characters with
any number $\kappa \ge 2$ of states, where the case $\kappa = 4$ corresponds to DNA sequences.
Our main result is the following:

Theorem 1. The $\kappa$-state GTR+Γ model is identifiable from the joint distributions
of triples of taxa for generic parameters on any tree with 3 or more taxa.
Moreover, when $\kappa = 4$ the model is identifiable for all parameters.

The term 'generic' here means for those GTR state distributions and rate 
matrices which do not satisfy at least one of a collection of equalities to be 
explicitly given in Theorem 2. Consequently, the set of non-generic parameters
is of Lebesgue measure zero in the full parameter space. Our arguments are
quite different from those attempted in [11]. We combine arguments from
algebra, algebraic geometry and analysis. 

We believe this paper presents the first correct proof of identifiability for 
any model with a continuous distribution $\mu$ of rates across sites that is not
fully known. The non-identifiability of some models with more freely-varying
distributions of rates across sites was established in [13]. That paper also
showed identifiability of rate-across-sites models built upon certain group-based
models provided the rate distribution $\mu$ is completely known. More recently,
[1] proved that tree topologies are identifiable for generic parameters in rather
general mixture models with a small number of classes. That result specializes
to give the identifiability of trees for the $\kappa$-state GTR models with at most
$\kappa - 1$ rates-across-sites classes, including the GTR+I model. Identifiability of
numerical model parameters for GTR+I is further explored in [2]. There have
also been a number of recent works dealing with non-identifiability of mixture
models which are not of the rates-across-sites type; these include [15], [16], [9], [8].

In Section 2 we define the GTR+Γ model, introduce notation, and reduce
Theorem 1 to the case of a 3-taxon tree. In Section 3 we use purely algebraic
arguments to determine from a joint distribution certain useful quantities defined
in terms of the model parameters. In Section 4, in the generic case of certain
algebraic expressions not vanishing, an analytic argument uses these quantities
to identify the model parameters. Focusing on the important case of $\kappa = 4$
for the remainder of the paper, in Section 5 we completely characterize the
exceptional cases of parameters not covered by our generic argument. Using
this additional information, in Section 6 we establish identifiability for these
cases as well. Finally, Section 7 briefly mentions several problems concerning
identifiability of phylogenetic models that remain open.

2. Preliminaries 

2.1. The GTR+rates-across-sites substitution model

The $\kappa$-state across-site rate-variation model is parameterized by:

1. An unrooted topological tree T, with all internal vertices of valence $\ge 3$,
and with leaves labeled by $a_1, a_2, \dots, a_n$. These labels represent taxa, and
the tree their evolutionary relationships.

2. A collection of edge lengths $t_e \ge 0$, where e ranges over the edges of T.
We require $t_e > 0$ for all internal edges of the tree, but allow $t_e \ge 0$
for pendant edges, provided no two taxa are total-edge-length-distance 0
apart. Thus if an edge e is pendant, the label on its leaf may represent
either an ancestral ($t_e = 0$) or non-ancestral ($t_e > 0$) taxon.

3. A distribution vector $\pi = (\pi_1, \dots, \pi_\kappa)$ with $\pi_i > 0$, $\sum_i \pi_i = 1$, representing
the frequencies of states occurring in biological sequences at all vertices of
T.

4. A $\kappa \times \kappa$ matrix $Q = (q_{ij})$, with $q_{ij} > 0$ for $i \ne j$ and $\sum_j q_{ij} = 0$ for
each i, such that $\mathrm{diag}(\pi) Q$ is symmetric. Q represents the instantaneous
substitution rates between states in a reversible Markov process. We will
also assume some normalization of Q has been imposed, for instance that
$\mathrm{diag}(\pi) Q$ has trace $-1$.

Note that the symmetry and row summation conditions imply that $\pi$
is a left eigenvector of Q with eigenvalue 0, which in turn implies $\pi$ is
stationary under the continuous-time process defined by Q.

5. A distribution $\mu$, with non-negative support and expectation $E(\mu) = 1$,
describing the distribution of rates among sites. If a site has rate
parameter r, then its instantaneous substitution rates will be given by
$rQ$.
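
As a concrete illustration of items 3 and 4, the following Python sketch builds a rate matrix with the required properties. It assumes the common parameterization $q_{ij} = s_{ij}\pi_j$ with a symmetric matrix `s` of exchangeabilities; the function name `gtr_rate_matrix` and all numerical values are hypothetical and chosen only for the example.

```python
def gtr_rate_matrix(pi, s):
    """Return Q with q_ij = s_ij * pi_j for i != j, rows summing to zero,
    rescaled so that diag(pi) Q has trace -1 (the normalization of item 4)."""
    k = len(pi)
    Q = [[s[i][j] * pi[j] if i != j else 0.0 for j in range(k)] for i in range(k)]
    for i in range(k):
        Q[i][i] = -sum(Q[i])                      # row sums become zero
    trace = sum(pi[i] * Q[i][i] for i in range(k))  # negative before rescaling
    scale = -1.0 / trace
    return [[scale * q for q in row] for row in Q]


# Hypothetical state frequencies and symmetric exchangeabilities (assumed values).
pi = [0.1, 0.2, 0.3, 0.4]
s = [[0.0, 1.0, 2.0, 1.5],
     [1.0, 0.0, 0.7, 3.0],
     [2.0, 0.7, 0.0, 1.2],
     [1.5, 3.0, 1.2, 0.0]]
Q = gtr_rate_matrix(pi, s)
```

Because $\pi_i q_{ij} = \pi_i s_{ij} \pi_j$ is symmetric in i and j, the matrix $\mathrm{diag}(\pi) Q$ is symmetric by construction, and summing $\pi_i q_{ij}$ over i then shows that $\pi$ is a left eigenvector with eigenvalue 0.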

Letting $[\kappa] = \{1, 2, \dots, \kappa\}$ denote the states, the joint distribution of states
at the leaves of the tree T which arises from a rate-across-sites GTR model
is computed as follows. For each rate r and edge e of the tree, let
$M_{e,r} = \exp(t_e r Q)$. Then with an arbitrary vertex $\rho$ of T chosen as a root, let

$$P_r(i_1, \dots, i_n) = \sum_{(h_v) \in H} \Big( \pi(h_\rho) \prod_e M_{e,r}(h_{s(e)}, h_{f(e)}) \Big), \tag{1}$$

where the product is taken over all edges e of T directed away from $\rho$, edge e
has initial vertex $s(e)$ and final vertex $f(e)$, and the sum is taken over the set

$$H = H_{i_1, \dots, i_n} = \{ (h_v)_{v \in \mathrm{Vert}(T)} \mid h_v \in [\kappa] \text{ if } v \ne a_j,\ h_v = i_j \text{ if } v = a_j \} \subset [\kappa]^{|\mathrm{Vert}(T)|}.$$

Thus H represents the set of all 'histories' consistent with the specified states
$i_1, \dots, i_n$ at the leaves, and the n-dimensional table $P_r$ gives the joint distribution
of states at the leaves given a site has rate parameter r. Since the Markov
process is reversible and stationary on $\pi$, this distribution is independent of
the choice of root $\rho$.

Finally, the joint distribution for the GTR+μ model is given by the
n-dimensional table

$$P = E(P_r) = \int_r P_r \, d\mu(r).$$

The distribution for the GTR+Γ model is given by additionally specifying a
parameter $\alpha > 0$, with $\mu$ then specialized to be the Γ-distribution with shape
parameter $\alpha$ and mean 1, i.e., with scale parameter $\beta = 1/\alpha$.

2.2. Diagonalization of Q 

The reversibility assumptions on a GTR model imply that
$\mathrm{diag}(\pi^{1/2}) \, Q \, \mathrm{diag}(\pi^{-1/2})$
is symmetric, and that Q can be represented as

$$Q = U \, \mathrm{diag}(0, \lambda_2, \lambda_3, \dots, \lambda_\kappa) \, U^{-1},$$

where the eigenvalues of Q satisfy $0 = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_\kappa$, and U is a
real matrix of associated eigenvectors satisfying the equivalent statements

$$U U^T = \mathrm{diag}(\pi)^{-1}, \qquad U^T \mathrm{diag}(\pi) \, U = I. \tag{2}$$

Furthermore, the first column of U may be taken to be the vector 1.

While the $\lambda_i$ are uniquely determined by these considerations, in the case
that all $\lambda_i$ are distinct the matrix U is determined only up to multiplication
of its individual columns by $\pm 1$. If the $\lambda_i$ are not distinct, eigenspaces are
uniquely determined but U is not.
Our method of determining Q from a joint distribution will proceed by determining
eigenspaces (via U) and the $\lambda_i$ separately. Although the non-uniqueness
of U will not matter for our arguments, the normalization determined by
equations (2) will be used to simplify our presentation.

2.3. Moment generating function 

We also use the moment generating function (i.e., essentially the Laplace 
transform) of the density function for the distribution of rates in our model. 
As our algebraic arguments will apply to arbitrary rate distributions, while our 
analytic arguments are focused on Γ distributions, we introduce notation for
the moment generating functions in both settings.

Definition 1. For any fixed distribution $\mu$ of rates r, let

$$L(u) = L_\mu(u) = E(e^{ru})$$

for $-\infty < u \le 0$, denote the expectation of $e^{ru}$. In the special case of
Γ-distributed rates, with parameters $\alpha > 0$ and $\beta = 1/\alpha$, let

$$L_\alpha(u) = L_{\Gamma,\alpha}(u) = E(e^{ru}) = \left(1 - \frac{u}{\alpha}\right)^{-\alpha}.$$

Note that L, and in particular $L_\alpha$, is an increasing function throughout its
domain.
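
The closed form for Γ-distributed rates can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the quadrature is a plain midpoint rule truncated at `r_max`, and the check is run for a shape parameter larger than 1 so that the density is bounded near 0.

```python
import math


def L_gamma(u, alpha):
    """Closed form (1 - u/alpha)^(-alpha) for u <= 0, Gamma rates with mean 1."""
    return (1.0 - u / alpha) ** (-alpha)


def L_gamma_inv(y, alpha):
    """Inverse of L_gamma on (0, 1]: alpha * (1 - y^(-1/alpha))."""
    return alpha * (1.0 - y ** (-1.0 / alpha))


def L_gamma_quadrature(u, alpha, n=50000, r_max=60.0):
    """Midpoint-rule approximation of E(e^{ru}) under the Gamma density
    with shape alpha and scale 1/alpha (truncated at r_max)."""
    h = r_max / n
    log_norm = alpha * math.log(alpha) - math.lgamma(alpha)
    total = 0.0
    for m in range(n):
        r = (m + 0.5) * h
        total += math.exp(log_norm + (alpha - 1.0) * math.log(r) - alpha * r + r * u)
    return total * h


closed_form = L_gamma(-0.7, 2.0)
numeric = L_gamma_quadrature(-0.7, 2.0)
```

The inverse `L_gamma_inv` is the map that later turns observed values of $L_\alpha$ back into the arguments $\lambda_k(t_a + t_b)$ and similar sums once $\alpha$ is known.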

2.4. Reduction to 3-taxon case 

To prove Theorem 1, it is sufficient to consider only the case of 3-taxon trees.

Lemma 1. If the statements of Theorem 1 hold for 3-taxon trees, then they
also hold for n-taxon trees when $n > 3$.

Proof. As the generic condition of Theorem 1 is a condition on $\pi$ and Q (see
Theorem 2 below for a precise statement), parameters on an n-taxon tree are
generic if and only if the induced parameters on all induced 3-taxon trees are
generic.

Figure 1: The unique 3-taxon tree relating taxa a, b, and c, with branch lengths $t_a$, $t_b$, and $t_c$.

If the model on 3-taxon trees is identifiable for certain parameters, then from
the joint distribution for a tree such as that of Figure 1, we may determine
$\alpha$, Q, $\pi$ and the 3 edge lengths $t_a, t_b, t_c$. Thus we may determine the pairwise
distances $t_a + t_b$, $t_a + t_c$, $t_b + t_c$ between the taxa. From an n-taxon distribution,
by considering marginalizations to 3 taxa we may thus determine $\alpha$, Q, $\pi$, and
all pairwise distances between taxa. From all pairwise distances, we may recover
the topological tree and all edge lengths by standard combinatorial arguments,
as in [T^ . 
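
For a single 3-taxon tree, the passage between edge lengths and pairwise distances is a small linear system. The following sketch (function name and numbers are hypothetical) inverts it; it does not attempt the n-taxon tree reconstruction mentioned above.

```python
def edge_lengths_from_distances(d_ab, d_ac, d_bc):
    """Solve t_a + t_b = d_ab, t_a + t_c = d_ac, t_b + t_c = d_bc."""
    t_a = (d_ab + d_ac - d_bc) / 2.0
    t_b = (d_ab + d_bc - d_ac) / 2.0
    t_c = (d_ac + d_bc - d_ab) / 2.0
    return t_a, t_b, t_c


# Distances generated from the assumed edge lengths 0.1, 0.25, 0.4.
t_a, t_b, t_c = edge_lengths_from_distances(0.35, 0.5, 0.65)
```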

3. Algebraic arguments 

We now determine some information that we may obtain algebraically from 
a joint distribution known to have arisen from the GTR+μ model on a tree
T relating 3 taxa. While in this paper we will only apply the results to the
GTR+Γ model, we derive them at their natural level of generality. We therefore
denote the moment generating function of the rate distribution by L, with its
dependence on $\mu$ left implicit.

As marginalizations of the joint distribution correspond to the model on 
induced trees T' with fewer taxa, we work with trees with 1, 2, or 3 leaves. 

If T' has only 1 leaf, it is simply a single vertex, and the distribution of states
is therefore $\pi$. Thus $\pi$ is identifiable from a joint distribution for 1 or more
taxa.
If T' has exactly 2 leaves, joined by an edge of length $t_e > 0$, then the joint
distribution can be expressed as

$$P = \mathrm{diag}(\pi) \, E(\exp(t_e r Q)) = \mathrm{diag}(\pi) \, U \, \mathrm{diag}(L(\lambda_1 t_e), \dots, L(\lambda_\kappa t_e)) \, U^{-1}.$$

Therefore, diagonalizing $\mathrm{diag}(\pi)^{-1} P$ determines the collection of $L(\lambda_i t_e)$ and
the columns of U up to factors of $\pm 1$. Since L is increasing, we may determine
individual $L(\lambda_i t_e)$ by the requirement that

$$1 = L(0) = L(\lambda_1 t_e) > L(\lambda_2 t_e) \ge \cdots \ge L(\lambda_\kappa t_e). \tag{3}$$

When the $\lambda_i$ are distinct, this fixes an ordering to the columns of U. Regardless,
we simply make a fixed choice of some U consistent with the inequalities (3)
and satisfying equations (2). We can further require this choice of U be made
consistently for all 2-taxon marginalizations of the joint distribution. Thus for
any tree relating 2 or more taxa, we may determine the eigenspaces of Q via U
and the value $L(\lambda_i d_{jk})$ for each i and pair of taxa $a_j, a_k$, where $d_{jk}$ is the total
edge-length distance between $a_j$ and $a_k$.
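
The 2-taxon formula can be exercised on a small example. The sketch below assumes a uniform $\pi$, the Hadamard-type eigenvector matrix of the Kimura 3-parameter model, hypothetical eigenvalues, and Γ-distributed rates with an assumed shape; it forms P from these and then checks that conjugating $\mathrm{diag}(\pi)^{-1} P$ by U returns the diagonal of values $L(\lambda_i t_e)$. A known U is used here in place of a numerical eigendecomposition to keep the example short.

```python
def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def L_gamma(u, alpha):
    return (1.0 - u / alpha) ** (-alpha)


# Assumed example parameters (not from the text).
pi = [0.25, 0.25, 0.25, 0.25]
U = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
lam = [0.0, -1.0, -1.25, -1.75]   # hypothetical eigenvalues, 0 = lam_1 > lam_2 >= ...
alpha, t_e = 0.5, 0.3

U_inv = matmul(transpose(U), diag(pi))            # U^{-1} = U^T diag(pi), from (2)
L_vals = [L_gamma(l * t_e, alpha) for l in lam]

# Forward direction: the 2-taxon joint distribution.
P = matmul(matmul(matmul(diag(pi), U), diag(L_vals)), U_inv)

# Reverse direction: U^{-1} diag(pi)^{-1} P U is diagonal with entries L(lam_i t_e).
D = matmul(matmul(matmul(U_inv, diag([1.0 / p for p in pi])), P), U)
recovered = [D[i][i] for i in range(4)]
```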

For T with exactly 3 leaves, let a, b, c be the taxa labeling them, with edge
lengths as in Figure 1, and let $X_a, X_b, X_c$ denote the character states at these
taxa. As in [3], denote by $P^{ab,\gamma}$ the square matrix containing the probabilities

$$P^{ab,\gamma}(i, j) = P(X_b = j, X_c = \gamma \mid X_a = i),$$

which can be computed from the joint distribution. But

$$P^{ab,\gamma} = E\big( e^{r t_a Q} \, \mathrm{diag}(e^{r t_c Q}_{\cdot\gamma}) \, e^{r t_b Q} \big),$$

where $e^{r t_c Q}_{\cdot\gamma}$ is the $\gamma$th column of the matrix $e^{r t_c Q}$, so

$$U^{-1} P^{ab,\gamma} U = E\big( \mathrm{diag}(e^{r t_a \lambda_1}, \dots, e^{r t_a \lambda_\kappa}) \, U^{-1} \, \mathrm{diag}(e^{r t_c Q}_{\cdot\gamma}) \, U \, \mathrm{diag}(e^{r t_b \lambda_1}, \dots, e^{r t_b \lambda_\kappa}) \big).$$

Note that the jth column of

$$\mathrm{diag}(e^{r t_c Q}_{\cdot\gamma}) \, U$$

is the same as the $\gamma$th column of

$$\mathrm{diag}(U_{\cdot j}) \, e^{r t_c Q}.$$

Thus when $(i, j)$ is fixed, the row vector formed by the $(i, j)$ entries of
$U^{-1} P^{ab,\gamma} U$ for $\gamma = 1, \dots, \kappa$ is

$$\mu^{ij} \, E\big( e^{r t_a \lambda_i} e^{r t_b \lambda_j} e^{r t_c Q} \big), \tag{4}$$

where $\mu^{ij}$ is the row vector with

$$\mu^{ij}(k) = U^{-1}(i, k) \, U(k, j) = \pi(k) \, U(k, i) \, U(k, j). \tag{5}$$

Finally, multiplying by U on the right, and setting $\nu^{ij} = \mu^{ij} U$, we see that
the information brought by the triple of taxa $\{a, b, c\}$ amounts to the knowledge
of

$$\nu^{ij} \, E\big( e^{r t_a \lambda_i} e^{r t_b \lambda_j} \, \mathrm{diag}(e^{r t_c \lambda_1}, \dots, e^{r t_c \lambda_\kappa}) \big),$$

i.e., to the knowledge of each

$$E\big( e^{r t_a \lambda_i} e^{r t_b \lambda_j} e^{r t_c \lambda_k} \big) = L(t_a \lambda_i + t_b \lambda_j + t_c \lambda_k)$$

for which $\nu^{ij}(k) \ne 0$.

This motivates the following notation, where for conciseness we let
$U_{ij} = U(i, j)$: For $i, j, k \in [\kappa]$, let

$$\nu_{ijk} = \sum_l \pi_l U_{li} U_{lj} U_{lk}.$$

Note that while $\nu_{ijk} = \nu^{ij}(k)$, we prefer this new notation since the value of
$\nu_{ijk}$ is unchanged by permuting subscripts:

$$\nu_{ijk} = \nu_{ikj} = \nu_{jik} = \nu_{jki} = \nu_{kij} = \nu_{kji}.$$

Furthermore, since $\pi$ can be determined from 1-taxon marginalizations, and U
from 2-taxon marginalizations, from a 3-taxon distribution we may compute
$\nu_{ijk}$ for all $i, j, k$.

In summary, we have shown the following: 
Proposition 1. From a distribution arising from the GTR+μ model on the
3-taxon tree of Figure 1, we may obtain the following information:

1. $\pi$, from 1-marginalizations,

2. all matrices U which diagonalize Q as above, and for all i the values

$$L(\lambda_i(t_a + t_b)), \quad L(\lambda_i(t_a + t_c)), \quad L(\lambda_i(t_b + t_c)),$$

from 2-marginalizations, and

3. the values $L(\lambda_i t_a + \lambda_j t_b + \lambda_k t_c)$ for all $i, j, k$ such that $\nu_{ijk} \ne 0$ for some
such choice of U.

Note that item (2) can be obtained as a special case of item (3) by taking $j = i$, $k = 1$,
as it is easy to see $\nu_{ii1} \ne 0$. We shall also see that $\nu_{ij1} = 0$ if $i \ne j$, so certainly
some of the $\nu_{ijk}$ can vanish.

One might expect that for most choices of GTR parameters all the $\nu_{ijk} \ne 0$
for $i, j, k > 1$. Indeed, this is generally the case, but for certain choices one
or more of these $\nu_{ijk}$ can vanish. The Jukes-Cantor and Kimura 2- and
3-parameter models provide simple examples of this for $\kappa = 4$: For these models,
one may choose

$$U = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

and $\nu_{ijk} \ne 0$ for $i, j, k > 1$ only when $i, j, k$ are distinct. While for the
Jukes-Cantor and Kimura 2-parameter models one may make other choices for U, one
can show that these alternative choices of U do not lead to the recovery of any
additional information.
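
The vanishing pattern for this choice of U is easy to tabulate. The sketch below assumes the uniform distribution $\pi = (1/4, 1/4, 1/4, 1/4)$ that goes with these models and uses 1-based indices to match the text; the helper name `nu` is hypothetical.

```python
from itertools import product

pi = [0.25, 0.25, 0.25, 0.25]
U = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]


def nu(i, j, k):
    """nu_ijk = sum_l pi_l U_li U_lj U_lk, with 1-based indices as in the text."""
    return sum(pi[l] * U[l][i - 1] * U[l][j - 1] * U[l][k - 1] for l in range(4))


# Triples with all indices > 1 for which nu_ijk does not vanish.
nonzero = sorted(t for t in product((2, 3, 4), repeat=3) if nu(*t) != 0)
```

For this U the list `nonzero` contains only triples with distinct entries, while `nu(i, i, 1)` equals 1 and `nu(i, j, 1)` vanishes for `i != j`, in line with the remarks following Proposition 1.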

Nonetheless, for $\kappa \ge 3$ there is always some genuine 3-taxon information
available from a distribution, as we now show. Although we do not need the
following proposition for the proof of Theorem 1, the method of argument it
introduces underlies Section 5 below.

Proposition 2. With $\kappa \ge 3$, for any choice of GTR parameters there exists at
least one triple $i, j, k > 1$ with $\nu_{ijk} \ne 0$.

Proof. Suppose for all triples $i, j, k > 1$,

$$\nu_{ijk} = \sum_l \pi_l U_{li} U_{lj} U_{lk} = 0. \tag{6}$$

From equation (2) we also know that if $j \ne k$, then

$$\nu_{1jk} = \sum_l \pi_l U_{lj} U_{lk} = 0. \tag{7}$$

Both of these equations can be expressed more conveniently by introducing
the inner product

$$\langle x, y \rangle = x^T \mathrm{diag}(\pi) \, y.$$

Then with $U_i$ being the ith column of U, and $W_{jk}$ being the vector whose lth
entry is the product $U_{lj} U_{lk}$, equations (6) give the orthogonality statements

$$\langle U_i, W_{jk} \rangle = 0, \quad \text{if } i, j, k > 1,$$

while equations (7) yield both

$$\langle U_1, W_{jk} \rangle = 0, \quad \text{if } j \ne k, \text{ and}$$
$$\langle U_j, U_k \rangle = 0, \quad \text{if } j \ne k.$$

In particular, we see for $j, k > 1$, $j \ne k$, that $W_{jk}$ is orthogonal to all $U_i$, and
so $W_{jk} = 0$. Considering individual entries of $W_{jk}$ gives that, for every l,

$$U_{lj} U_{lk} = 0, \quad \text{for all } j, k > 1,\ j \ne k. \tag{8}$$

Now note that for any $j > 1$, the vector $U_j$ must have at least 2 non-zero
entries. (This is simply because $U_j$ is a non-zero vector, and $\langle \mathbf{1}, U_j \rangle = 0$ since
$U_1 = \mathbf{1}$.) We use this observation, together with equation (8), to arrive at a
contradiction.

First, without loss of generality, assume the first two entries of $U_2$ are non-zero.
Then by equation (8) the first two entries of all the vectors $U_3, U_4, \dots$
must be 0. But then we may assume the third and fourth entries of $U_3$ are
non-zero, and so the first 4 entries of $U_4, \dots$ are zero. For the 4-state DNA
model, this shows $U_4 = 0$, which is impossible.

More generally, for a $\kappa$-state model, we find $U_k = 0$ as soon as $2(k - 2) \ge \kappa$.
Note that for $\kappa \ge 4$ this happens for some value of $k \le \kappa$, thus contradicting
that the $U_k$ are non-zero. In the $\kappa = 3$ case the same argument gives that $U_3$
has only one non-zero entry, which is still a contradiction, since $U_3$ is orthogonal
to $U_1 = \mathbf{1}$. Thus the proposition is established for a $\kappa$-state model with $\kappa \ge 3$.

For $\kappa = 2$, the statement of Proposition 2 does not hold, as is shown by
considering the 2-state symmetric model, with

$$\pi = (1/2, 1/2), \quad \text{and} \quad U = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

However, one can show this is the only choice of $\pi$ and U for which $\nu_{222} = 0$.

4. Identifiability for generic parameters

We now complete the proof of the first statement in Theorem 1, the identifiability
of the GTR+Γ model for generic parameters, which is valid for all values
of $\kappa \ge 2$. As we now consider only Γ-distributed rates, we use the specialized
moment generating function $L_\alpha$ in our arguments.

More precisely, we will establish the following:

Theorem 2. For $\kappa \ge 2$, consider those GTR parameters for which there exist
some $i, j$ with $1 < i \le j$, such that $\nu_{ijj} \ne 0$. Then restricted to these
parameters, the GTR+Γ model is identifiable on 3-taxon trees.

Remark 1. Note that the conditions $\nu_{ijj} = 0$ are polynomial in the entries of
U and $\pi$. Viewing the GTR model as parameterized by those variables together
with the $\lambda_i$, then the set of points in parameter space for which $\nu_{ijj} = 0$ for
some $i, j$ with $1 < i \le j$ forms a proper algebraic variety. Basic facts of
algebraic geometry then imply this set is of strictly lower dimension than the
full parameter space. A generic point in parameter space therefore lies off this
exceptional variety, and the exceptional points have Lebesgue measure zero in
the full parameter space.

Remark 2. For $\kappa = 2$, identifiability does not hold for the 3-taxon tree if
the generic condition that $\nu_{ijj} \ne 0$ for some $1 < i \le j$ is dropped. Indeed,
if $\nu_{222} = 0$, then, as commented in the last section, $\pi$ and U arise from the
2-state symmetric model. Since there are only two eigenvalues of Q, $\lambda_1 = 0$
and $\lambda_2 < 0$, the second of these is determined by the normalization of Q. As
the proof of Proposition 1 indicates, the only additional information we may
obtain from the joint distribution is the three quantities

$$L_\alpha(\lambda_2(t_a + t_b)), \quad L_\alpha(\lambda_2(t_a + t_c)), \quad L_\alpha(\lambda_2(t_b + t_c)).$$

Since these depend on four unknown parameters $\alpha, t_a, t_b, t_c$, it is straightforward
to see the parameter values are not uniquely determined.

Our proof of Theorem 2 will depend on the following technical lemma.

Lemma 2. Suppose $c \ge a \ge d_1 > 0$ and $c \ge b > d_2 > 0$. Then the equation

$$d_1^{-\beta} + d_2^{-\beta} - a^{-\beta} - b^{-\beta} - c^{-\beta} + 1 = 0$$

has at most one solution with $\beta > 0$.

Proof. Multiplying by $c^\beta$, the equation can be rewritten as

$$\left[ \left(\frac{c}{d_1}\right)^{\beta} - \left(\frac{c}{a}\right)^{\beta} \right] + \left[ \left(\frac{c}{d_2}\right)^{\beta} - \left(\frac{c}{b}\right)^{\beta} \right] + \left[ c^{\beta} - 1 \right] = 0. \tag{9}$$

Now a function $g(\beta) = r^\beta - s^\beta$ is strictly convex on $\beta \ge 0$ provided $r > s \ge 1$,
since $g''(\beta) > 0$. If $r = s$, then $g(\beta) = 0$ is still convex. Thus when viewed as
a function of $\beta$ the first expression on the left side of equation (9) is convex,
and the second expression is strictly convex. Also, for any $r > 0$ the function
$h(\beta) = r^\beta - 1$ is convex, so the third expression in equation (9) is convex as well.
Thus the sum of these three terms, the left side of equation (9), is a strictly
convex function of $\beta$.

But a strictly convex function of one variable can have at most two zeros.
Since the function defined by the left side of equation (9) has one zero at $\beta = 0$,
it therefore can have at most one zero with $\beta > 0$.
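
The convexity used in this proof can be probed numerically. The sketch below takes one grouping consistent with the argument, namely the equation of Lemma 2 multiplied by $c^\beta$ and split into three terms, and checks on a grid that its discrete second differences are positive. The constants are hypothetical values satisfying the hypotheses of the lemma; a grid check of this kind illustrates the claim but does not prove it.

```python
def G(beta, a, b, c, d1, d2):
    """c^beta times the left side of the equation in Lemma 2, grouped in three terms."""
    first = (c / d1) ** beta - (c / a) ** beta     # r = c/d1 >= s = c/a >= 1
    second = (c / d2) ** beta - (c / b) ** beta    # r = c/d2 >  s = c/b >= 1
    third = c ** beta - 1.0
    return first + second + third


# Hypothetical constants with c >= a >= d1 > 0 and c >= b > d2 > 0.
a, b, c, d1, d2 = 0.6, 0.5, 0.7, 0.4, 0.3
h = 0.05
grid = [m * h for m in range(1, 200)]
second_diffs = [G(x + h, a, b, c, d1, d2) - 2.0 * G(x, a, b, c, d1, d2)
                + G(x - h, a, b, c, d1, d2) for x in grid]
value_at_zero = G(0.0, a, b, c, d1, d2)
```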

Proof of Theorem 2. For some $j \ge i > 1$, we are given that $\nu_{ijj} \ne 0$. As
$\nu_{ijj} = \nu_{jij}$, by Proposition 1 we may determine the values

$$D_{ijj} = L_\alpha(\lambda_i t_a + \lambda_j t_b + \lambda_j t_c),$$
$$D_{jij} = L_\alpha(\lambda_j t_a + \lambda_i t_b + \lambda_j t_c),$$

as well as

$$C_k = L_\alpha(\lambda_k(t_a + t_b)),$$
$$B_k = L_\alpha(\lambda_k(t_a + t_c)),$$
$$A_k = L_\alpha(\lambda_k(t_b + t_c))$$

for $k = 1, \dots, \kappa$.

Since $L_\alpha$ is increasing, for any $k > 1$ we can use the values of $C_k, B_k$ to
determine which of $t_b$ and $t_c$ is larger. Proceeding similarly, we may determine
the relative ranking of $t_a$, $t_b$, and $t_c$. Without loss of generality, we therefore
assume

$$0 \le t_a \le t_b \le t_c$$

for the remainder of this proof. Note however that if $t_a = 0$, then $t_b > 0$, by
our assumption on model parameters that no two taxa be total-edge-length-distance
0 apart.
Observe that

$$L_\alpha^{-1}(D_{ijj}) + L_\alpha^{-1}(D_{jij}) = L_\alpha^{-1}(A_j) + L_\alpha^{-1}(B_j) + L_\alpha^{-1}(C_i),$$

or, using the formula $L_\alpha^{-1}(y) = \alpha(1 - y^{-1/\alpha})$ and letting $\beta = 1/\alpha$,

$$D_{ijj}^{-\beta} + D_{jij}^{-\beta} - A_j^{-\beta} - B_j^{-\beta} - C_i^{-\beta} + 1 = 0. \tag{10}$$

Since $j \ge i > 1$, we have that $\lambda_j \le \lambda_i < 0$. Because $L_\alpha$ is an increasing
function, and $0 \le t_a \le t_b \le t_c$, with $t_b > 0$, this implies

$$C_i \ge A_j \ge D_{ijj}, \quad \text{and}$$
$$C_i \ge B_j > D_{jij}.$$

Thus applying Lemma 2 to equation (10), with

$$a = A_j, \quad b = B_j, \quad c = C_i, \quad d_1 = D_{ijj}, \quad d_2 = D_{jij},$$

we find $\beta$ is uniquely determined, so $\alpha = 1/\beta$ is identifiable.
Once $\alpha$ is known, for every k we may determine the quantities

$$\lambda_k(t_a + t_b) = L_\alpha^{-1}(C_k),$$
$$\lambda_k(t_a + t_c) = L_\alpha^{-1}(B_k),$$
$$\lambda_k(t_b + t_c) = L_\alpha^{-1}(A_k).$$

Thus we may determine the ratio between any two eigenvalues $\lambda_k$. As U is
known, this determines Q up to scaling. Since we have required a normalization
of Q, this means Q is identifiable. With the $\lambda_k$ now determined, we can find
$t_a + t_b$, $t_a + t_c$ and $t_b + t_c$, and hence $t_a, t_b, t_c$.
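
The steps of this proof can be followed on synthetic numbers. The sketch below is an illustration under stated assumptions: the true parameters are hypothetical, a pair $i, j$ with $\nu_{ijj} \ne 0$ is simply assumed to exist, and the positive root of equation (10) is located by bracketing and bisection, a numerical technique chosen here for simplicity. The final rescaling of the eigenvalues by the normalization of Q is omitted, so only products and ratios are recovered.

```python
def L_gamma(u, alpha):
    return (1.0 - u / alpha) ** (-alpha)


def L_gamma_inv(y, alpha):
    return alpha * (1.0 - y ** (-1.0 / alpha))


# Hypothetical true parameters (1-based eigenvalue indices as in the text).
alpha_true = 0.5
lam = {1: 0.0, 2: -1.0, 3: -1.25, 4: -1.75}
t_a, t_b, t_c = 0.1, 0.2, 0.4      # ordered so that 0 <= t_a <= t_b <= t_c
i, j = 2, 3                         # assumed to satisfy nu_ijj != 0

# Quantities made available by Proposition 1.
A = {k: L_gamma(lam[k] * (t_b + t_c), alpha_true) for k in lam}
B = {k: L_gamma(lam[k] * (t_a + t_c), alpha_true) for k in lam}
C = {k: L_gamma(lam[k] * (t_a + t_b), alpha_true) for k in lam}
D_ijj = L_gamma(lam[i] * t_a + lam[j] * t_b + lam[j] * t_c, alpha_true)
D_jij = L_gamma(lam[j] * t_a + lam[i] * t_b + lam[j] * t_c, alpha_true)


def F(beta):
    """Left side of equation (10)."""
    return (D_ijj ** -beta + D_jij ** -beta
            - A[j] ** -beta - B[j] ** -beta - C[i] ** -beta + 1.0)


# F(0) = 0; when a positive root exists, F is negative between 0 and that root
# and positive beyond it, so doubling finds a bracket for bisection.
hi = 1e-3
while F(hi) <= 0.0:
    hi *= 2.0
lo = hi / 2.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if F(mid) <= 0.0:
        lo = mid
    else:
        hi = mid
alpha_hat = 1.0 / (0.5 * (lo + hi))

# With alpha known, invert L_alpha to get the products lambda_k * (sum of edge lengths).
x_ab = {k: L_gamma_inv(C[k], alpha_hat) for k in lam}   # lambda_k (t_a + t_b)
x_ac = {k: L_gamma_inv(B[k], alpha_hat) for k in lam}   # lambda_k (t_a + t_c)
x_bc = {k: L_gamma_inv(A[k], alpha_hat) for k in lam}   # lambda_k (t_b + t_c)
eigenvalue_ratios = (x_ab[3] / x_ab[2], x_ab[4] / x_ab[2])
```

The uniqueness of the root found by bisection is exactly what Lemma 2 guarantees; without it, the bracket could in principle contain a different solution.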

5. Exceptional cases ($\kappa = 4$)

In the previous section, identifiability was proved under the assumption that
$\nu_{ijj} \ne 0$ for some $j \ge i > 1$. We now specialize to the case of $\kappa = 4$, and
determine those GTR parameters for which none of these conditions holds. In
the subsequent section, we will use this information to argue that even in these
exceptional cases the GTR+Γ model is identifiable.
Note that while we work only with a 4-state model appropriate to DNA,
the approach we use may well apply for larger $\kappa$, though one should expect
additional exceptional subcases to appear.

Lemma 3. For $\kappa = 4$, consider a choice of GTR parameters for which $\nu_{ijj} = 0$
for all $j \ge i > 1$. Then, up to permutation of the states and multiplication of
some columns of U by $-1$, the distribution vector $\pi$ and eigenvector matrix U
satisfy one of the two following sets of conditions:

Case A: $\pi = (1/4, 1/4, 1/4, 1/4)$, and for some $b, c \ge 0$ with $b^2 + c^2 = 2$,

$$U = \begin{pmatrix} 1 & c & b & 1 \\ 1 & -c & -b & 1 \\ 1 & -b & c & -1 \\ 1 & b & -c & -1 \end{pmatrix}.$$

Case B: $\pi = (1/8, 1/8, 1/4, 1/2)$, and

$$U = \begin{pmatrix} 1 & 2 & \sqrt{2} & 1 \\ 1 & -2 & \sqrt{2} & 1 \\ 1 & 0 & -\sqrt{2} & 1 \\ 1 & 0 & 0 & -1 \end{pmatrix}.$$

Proof. We use the notation of Proposition 2, including the inner product
and definition of vectors $W_{ij}$ given in its proof. Orthogonality and lengths will
always be with respect to that inner product.

We will repeatedly use that for $i, j$ with $1 < i \le j$,

$$\langle W_{jj}, U_i \rangle = \nu_{ijj} = 0.$$

In particular, setting $j = 4$, we find $W_{44}$ is orthogonal to $U_2, U_3, U_4$, and
hence is a multiple of $U_1 = \mathbf{1}$. This implies

$$U_4 = (\pm 1, \pm 1, \pm 1, \pm 1),$$

since $U_4$ has length 1. Without loss of generality, by possibly permuting the
rows of U (which is equivalent to changing the ordering of the states in writing
down the rate matrix Q), and then possibly multiplying $U_4$ by $-1$, we need
now only consider two cases: either

Case A: $U_4 = (1, 1, -1, -1)$, or
Case B: $U_4 = (1, 1, 1, -1)$.

We consider these two cases separately.

Case A: Since $U_1 = \mathbf{1}$ and $U_4 = (1, 1, -1, -1)$, the orthogonality of $U_1$ and $U_4$
gives

$$\pi_1 + \pi_2 - \pi_3 - \pi_4 = 0.$$

Since $\sum_{i=1}^4 \pi_i = 1$, this tells us

$$\pi_1 + \pi_2 = 1/2, \qquad \pi_3 + \pi_4 = 1/2. \tag{11}$$

Now since $W_{33}$ is orthogonal to both $U_2$ and $U_3$, then $W_{33}$ is a linear
combination of $U_1$ and $U_4$, and hence $W_{33} = (b^2, b^2, c^2, c^2)$. Thus

$$U_3 = (\pm b, \pm b, \pm c, \pm c).$$

Since $U_3$ is orthogonal to both $U_1$ and $U_4$, it is orthogonal to their linear
combinations, and in particular to $(1, 1, 0, 0)$ and $(0, 0, 1, 1)$. Thus, by permuting
the first two entries of the $U_i$, and also permuting the last two entries of the
$U_i$, if necessary, we may assume

$$U_3 = (b, -b, c, -c)$$

with $b, c \ge 0$. This orthogonality further shows

$$b\pi_1 - b\pi_2 = 0, \qquad c\pi_3 - c\pi_4 = 0.$$

Thus

$$\pi_1 = \pi_2, \text{ or } b = 0,$$

and

$$\pi_3 = \pi_4, \text{ or } c = 0.$$

In light of equations (11), we have

$$\pi_1 = \pi_2 = 1/4, \text{ or } b = 0,$$

and

$$\pi_3 = \pi_4 = 1/4, \text{ or } c = 0.$$

In any of these cases, $U_3$ has length 1 so

$$b^2(\pi_1 + \pi_2) + c^2(\pi_3 + \pi_4) = 1.$$

Together with equations (11) this gives that

$$b^2 + c^2 = 2.$$

Now since $U_2$ is orthogonal to $U_1, U_3, U_4$, we must have that

$$U_2 = a(c/\pi_1, -c/\pi_2, -b/\pi_3, b/\pi_4)$$

for some a, and we may assume $a > 0$. But the length of $U_2$ is 1, and $U_2$ is
orthogonal to $W_{22}$, so

$$c^2/\pi_1 + c^2/\pi_2 + b^2/\pi_3 + b^2/\pi_4 = 1/a^2, \tag{12}$$
$$c^3/\pi_1^2 - c^3/\pi_2^2 - b^3/\pi_3^2 + b^3/\pi_4^2 = 0. \tag{13}$$

If neither of $b, c$ is zero, so all $\pi_i = 1/4$, then equation (12) tells us $a = 1/4$,
as the statement of the lemma claims.

If $b = 0$, then we already know $c = \sqrt{2}$, and $\pi_3 = \pi_4 = 1/4$. But equation
(13) implies $\pi_1 = \pi_2$, so these are also 1/4. We then find from equation (12)
that $a = 1/4$, and we have another instance of the claimed characterization of
case A. Similarly, if $c = 0$ we obtain the remaining instance.



Identifiability of a model of molecular evolution 21 

Case B: Since $U_1 = \mathbf{1}$ and $U_4 = (1, 1, 1, -1)$, the orthogonality of $U_1$ and $U_4$
implies

$$\pi_1 + \pi_2 + \pi_3 - \pi_4 = 0.$$

Now $W_{33}$ is orthogonal to $U_2$ and $U_3$, and hence is a linear combination of
$U_1$ and $U_4$. Thus $W_{33} = (b^2, b^2, b^2, c^2)$, so

$$U_3 = (\pm b, \pm b, \pm b, c).$$

But $U_3$ is orthogonal to both $U_1$ and $U_4$, and hence orthogonal to their linear
combinations, including $(0, 0, 0, 1)$ and $(1, 1, 1, 0)$. This shows $c = 0$ and that
(possibly by permuting the first three rows of U, and multiplying $U_3$ by $-1$)
we may assume $U_3 = b(1, 1, -1, 0)$ for some $b > 0$. Orthogonality of $U_3$ and $U_1$
then shows

$$\pi_1 + \pi_2 - \pi_3 = 0.$$

Also $W_{22}$ is orthogonal to $U_2$, and hence is a linear combination of $U_1, U_3, U_4$,
so $W_{22} = (d^2, d^2, e^2, f^2)$. Thus

$$U_2 = (\pm d, \pm d, e, f).$$

However, since $U_2$ is orthogonal to $U_1, U_3, U_4$, it is orthogonal to $(0, 0, 0, 1)$,
$(0, 0, 1, 0)$, and $(1, 1, 0, 0)$. Thus we may assume $U_2 = d(1, -1, 0, 0)$ with $d > 0$.
Finally, orthogonality of $U_2$ and $U_1$ implies

$$\pi_1 - \pi_2 = 0.$$

All the above equations relating the $\pi_i$, together with the fact that
$\sum_{i=1}^4 \pi_i = 1$, give

$$\pi = \pi_1(1, 1, 2, 4) = (1/8, 1/8, 1/4, 1/2).$$

We can now determine the $U_i$ exactly, using that they must have length 1,
to show U is as claimed.
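
The two exceptional families can be sanity-checked numerically. In the sketch below the matrices are obtained by normalizing the vectors derived in the proof to length 1 (for case B this gives $b = \sqrt{2}$ and $d = 2$); the particular value of b in case A and the helper names are hypothetical. The code checks the normalization $U^T \mathrm{diag}(\pi) U = I$ and that $\nu_{ijj}$ vanishes for all $1 < i \le j$.

```python
import math


def gram(pi, U):
    """Entries of U^T diag(pi) U, which should form the identity matrix."""
    k = len(pi)
    return [[sum(pi[l] * U[l][i] * U[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]


def nu(pi, U, i, j, k):
    """nu_ijk = sum_l pi_l U_li U_lj U_lk, with 1-based indices."""
    return sum(pi[l] * U[l][i - 1] * U[l][j - 1] * U[l][k - 1] for l in range(len(pi)))


def max_exceptional_nu(pi, U):
    """Largest |nu_ijj| over 1 < i <= j; zero in the exceptional cases."""
    return max(abs(nu(pi, U, i, j, j)) for i in range(2, 5) for j in range(i, 5))


# Case A with an assumed value of b; c is fixed by b^2 + c^2 = 2.
b = 0.6
c = math.sqrt(2.0 - b * b)
pi_A = [0.25, 0.25, 0.25, 0.25]
U_A = [[1, c, b, 1],
       [1, -c, -b, 1],
       [1, -b, c, -1],
       [1, b, -c, -1]]

# Case B.
r2 = math.sqrt(2.0)
pi_B = [1 / 8, 1 / 8, 1 / 4, 1 / 2]
U_B = [[1, 2, r2, 1],
       [1, -2, r2, 1],
       [1, 0, -r2, 1],
       [1, 0, 0, -1]]

gram_A, gram_B = gram(pi_A, U_A), gram(pi_B, U_B)
worst_A, worst_B = max_exceptional_nu(pi_A, U_A), max_exceptional_nu(pi_B, U_B)
```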
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6. Identifiability in the exceptional cases {k — 4) 

We now complete the proof of Theorem [1] by showing identifiabihty in cases 
A and B of Lemma [3l We do this by first estabhshing some inequahties for the 
eigenvalues of Q that must hold in each of these cases, using the assumption 
that the off-diagonal entries of Q are positive. 

Note that as U^^ = ^7"^ diag(7r), and the entries of tt are positive, the 
positivity of the off-diagonal entries of Q is equivalent to the positivity of the 
off-diagonal entries of the symmetric matrix 

Q = [/diag(0,A2,A3,A4)C/'^. 

Lemma 4. For $\kappa = 4$, let $0 = \lambda_1 > \lambda_2 > \lambda_3 > \lambda_4$ denote the eigenvalues of a
GTR rate matrix $Q$. Then the following additional inequalities hold in cases A
and B of Lemma 3:

Case A: If $bc \ne 0$, then $\lambda_4 > \lambda_2 + \lambda_3$, while if $bc = 0$, then $\lambda_4 > 2\lambda_2$.

Case B: $\lambda_4 > 2\lambda_2$.

Proof. For case A, one computes that

$$\tilde{Q} = \begin{pmatrix}
* & -\lambda_2 c^2 - \lambda_3 b^2 + \lambda_4 & -\lambda_2 bc + \lambda_3 bc - \lambda_4 & \lambda_2 bc - \lambda_3 bc - \lambda_4 \\
* & * & \lambda_2 bc - \lambda_3 bc - \lambda_4 & -\lambda_2 bc + \lambda_3 bc - \lambda_4 \\
* & * & * & -\lambda_2 b^2 - \lambda_3 c^2 + \lambda_4 \\
* & * & * & *
\end{pmatrix},$$

where the stars indicate quantities not of interest. From the positivity of the
(1,2) and (3,4) entries of $\tilde{Q}$, we thus know

$$\lambda_4 > \max(\lambda_2 c^2 + \lambda_3 b^2,\ \lambda_2 b^2 + \lambda_3 c^2) \ge \frac{(\lambda_2 c^2 + \lambda_3 b^2) + (\lambda_2 b^2 + \lambda_3 c^2)}{2}.$$

Since $b^2 + c^2 = 2$, this shows $\lambda_4 > \lambda_2 + \lambda_3$. In the case when $bc = 0$, so
$(b, c) = (0, \sqrt{2})$ or $(\sqrt{2}, 0)$, the first inequality gives the stronger statement of
the lemma.

For case B,



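$$\tilde{Q} = \begin{pmatrix}
* & -4\lambda_2 + 2\lambda_3 + \lambda_4 & -2\lambda_3 + \lambda_4 & -\lambda_4 \\
* & * & -2\lambda_3 + \lambda_4 & -\lambda_4 \\
* & * & * & -\lambda_4 \\
* & * & * & *
\end{pmatrix}.$$

From the positivity of the off-diagonal entries, we see that

$$\lambda_4 > 2\lambda_3, \qquad \lambda_4 + 2\lambda_3 > 4\lambda_2.$$

Together, these imply that $\lambda_4 > 2\lambda_2$: adding the two inequalities gives
$2\lambda_4 + 2\lambda_3 > 2\lambda_3 + 4\lambda_2$.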
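
The Case B entries can be checked numerically. The sketch below builds $U \operatorname{diag}(0, \lambda_2, \lambda_3, \lambda_4) U^T$ from the Case B columns and compares a few entries with the displayed formulas. Two things are assumptions made for illustration: the scale factors 2 and $\sqrt{2}$ on the second and third columns (obtained from the unit-length condition under the $\pi$-weighted inner product), and the particular eigenvalues in `lam`, which are hypothetical values chosen to satisfy both inequalities.

```python
import math


def q_tilde(U, lam):
    """Return U diag(lam) U^T, with U given as a list of rows."""
    n = len(U)
    return [
        [sum(lam[k] * U[i][k] * U[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


s2 = math.sqrt(2.0)

# Columns are U_1, U_2, U_3, U_4 of Case B (assumed unit pi-length scaling).
U_B = [
    [1.0, 2.0, s2, 1.0],
    [1.0, -2.0, s2, 1.0],
    [1.0, 0.0, -s2, 1.0],
    [1.0, 0.0, 0.0, -1.0],
]

# Hypothetical eigenvalues (0, lambda_2, lambda_3, lambda_4), illustration only.
lam = [0.0, -1.0, -1.2, -1.5]

Q_tilde = q_tilde(U_B, lam)

# Entries of the first row, to compare with the displayed matrix.
print(Q_tilde[0][1], -4 * lam[1] + 2 * lam[2] + lam[3])
print(Q_tilde[0][2], -2 * lam[2] + lam[3])
print(Q_tilde[0][3], -lam[3])
```

For these sample values every off-diagonal entry is positive, and $\lambda_4 > 2\lambda_2$ holds as the lemma requires.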



We now return to proving identifiability for the exceptional cases. As in the
proof of Theorem 2, we may determine the relative rankings of $t_a$, $t_b$ and $t_c$,
and therefore assume

$$0 \le t_a \le t_b \le t_c$$

with $t_b > 0$.

In case A, we find that $\nu_{234} = bc$, so we break that case into two subcases,

Case A1: if $b, c \ne 0$; and
Case A2: if $b$ or $c = 0$.

Case A1: In this case, we find that $\nu_{ijk} \ne 0$ for all distinct $i, j, k > 1$. Letting

$$D_{342} = L_\alpha(\lambda_3 t_a + \lambda_4 t_b + \lambda_2 t_c),$$
$$D_{423} = L_\alpha(\lambda_4 t_a + \lambda_2 t_b + \lambda_3 t_c),$$

and $A_k$, $B_k$, $C_k$ be as in the proof of Theorem 2, observe that

$$L_\alpha^{-1}(D_{342}) + L_\alpha^{-1}(D_{423}) = L_\alpha^{-1}(A_2) + L_\alpha^{-1}(B_3) + L_\alpha^{-1}(C_4).$$

Setting $\beta = 1/\alpha$ and using the explicit formula for $L_\alpha^{-1}$ yields

$$D_{342}^{-\beta} + D_{423}^{-\beta} - A_2^{-\beta} - B_3^{-\beta} - C_4^{-\beta} + 1 = 0. \tag{14}$$

24 Allman, Ané, Rhodes

Note that by Proposition 1, all constants in this equation, except possibly $\beta$,
are uniquely determined by the joint distribution.
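
To see why equation (14) pins down $\beta$, it can help to evaluate its left-hand side numerically. The sketch below does so under assumptions that are not stated in this section and should be read as hypotheses: that $L_\alpha(x) = (1 - x/\alpha)^{-\alpha}$ (the transform of a mean-one gamma rate), and that $A_k = L_\alpha(\lambda_k(t_b + t_c))$, $B_k = L_\alpha(\lambda_k(t_a + t_c))$, $C_k = L_\alpha(\lambda_k(t_a + t_b))$. These definitions are the ones that make the displayed identity for $L_\alpha^{-1}$ balance term by term. All numerical values are made up for illustration.

```python
def L(alpha, x):
    """Assumed form of L_alpha for a mean-one gamma rate distribution."""
    return (1.0 - x / alpha) ** (-alpha)


def equation_14(alpha, lam, ta, tb, tc, beta):
    """Left-hand side of equation (14) for trial exponent beta.

    lam = (lambda_2, lambda_3, lambda_4); the definitions of A_2, B_3, C_4
    below are assumptions, as explained in the accompanying text.
    """
    l2, l3, l4 = lam
    D342 = L(alpha, l3 * ta + l4 * tb + l2 * tc)
    D423 = L(alpha, l4 * ta + l2 * tb + l3 * tc)
    A2 = L(alpha, l2 * (tb + tc))
    B3 = L(alpha, l3 * (ta + tc))
    C4 = L(alpha, l4 * (ta + tb))
    return (D342 ** -beta + D423 ** -beta
            - A2 ** -beta - B3 ** -beta - C4 ** -beta + 1.0)


# Hypothetical parameter values, for illustration only.
alpha = 0.5
lam = (-1.0, -1.2, -1.5)
ta, tb, tc = 0.1, 0.2, 0.3

residual_true = equation_14(alpha, lam, ta, tb, tc, beta=1.0 / alpha)
residual_other = equation_14(alpha, lam, ta, tb, tc, beta=1.0)
print(residual_true, residual_other)
```

With these sample values the residual vanishes (up to rounding) at $\beta = 1/\alpha$ and not at the other trial value, which is the behavior the uniqueness argument relies on.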

In preparation for applying Lemma 2, we claim that the following inequalities
hold:

$$D_{342} < A_2, \tag{15}$$
$$D_{423} < B_3, \tag{16}$$
$$D_{342} < C_4, \tag{17}$$
$$D_{423} < C_4. \tag{18}$$

Inequalities (15) and (16) follow easily from the fact that $L_\alpha$ is increasing. For
inequality (17), note first that $\lambda_3 t_a + \lambda_4 t_b + \lambda_2 t_c \le (\lambda_2 + \lambda_3) t_a + \lambda_4 t_b$. But
Lemma 4 indicates $\lambda_2 + \lambda_3 < \lambda_4$, so, again using that $L_\alpha$ is increasing, the
claim follows. Inequality (18) is similarly shown to hold.

Finally, to apply Lemma 2, let $d_1 = D_{342}$, $d_2 = D_{423}$. The remainder of the
constants in the lemma are chosen in one of three ways, depending on which of
$A_2$, $B_3$, $C_4$ is largest:

If $C_4 > A_2, B_3$, then let $a = A_2$, $b = B_3$, $c = C_4$.

If $A_2 > C_4, B_3$, then let $a = C_4$, $b = B_3$, $c = A_2$.

If $B_3 > C_4, A_2$, then let $a = A_2$, $b = C_4$, $c = B_3$.

Thus in all subcases, from equation (14) we find that $\beta > 0$ is uniquely
determined.

The remainder of the proof now proceeds exactly as for Theorem 2.

Cases A2 and B: In both of these cases $\nu_{224} \ne 0$, so, similarly to the previous
case, letting

$$D_{422} = L_\alpha(\lambda_4 t_a + \lambda_2 t_b + \lambda_2 t_c),$$
$$D_{242} = L_\alpha(\lambda_2 t_a + \lambda_4 t_b + \lambda_2 t_c),$$

leads to

$$D_{422}^{-\beta} + D_{242}^{-\beta} - A_2^{-\beta} - B_2^{-\beta} - C_4^{-\beta} + 1 = 0. \tag{19}$$

By Proposition 1, we know all quantities in this equation except possibly $\beta$ are
uniquely determined from the joint distribution.

We also note the following inequalities hold:

$$A_2 < B_2, \tag{20}$$
$$D_{422} < A_2, \tag{21}$$
$$D_{242} < B_2, \tag{22}$$
$$D_{242} < C_4. \tag{23}$$

Inequalities (20)-(22) are implied by the fact that $L_\alpha$ is increasing. Inequality
(23) will follow from $\lambda_2(t_a + t_c) < \lambda_4 t_a$. However, $\lambda_2(t_a + t_c) \le 2\lambda_2 t_a < \lambda_4 t_a$
by Lemma 4.

To apply Lemma 2, let $d_1 = D_{422}$ and $d_2 = D_{242}$. In light of inequality (20),
we need assign the remaining constants according to only two cases:

If $C_4 > B_2$, let $a = A_2$, $b = B_2$, and $c = C_4$.

If $B_2 > C_4$, let $a = A_2$, $b = C_4$, and $c = B_2$.

In both cases, we find $\beta$ is uniquely determined, and the proof of
identifiability can be completed as in Theorem 2.

Thus identifiability of the GTR+Γ model when $\kappa = 4$ is established for all
cases.



7. Open problems 

Many questions remain on the identifiability of phylogenetic models, includ- 
ing those commonly used for data analysis. 

Perhaps the most immediate one is the identifiability of the GTR+Γ+I
model. Despite its widespread use in inference, no proof has appeared that the
tree topology is identifiable for this model, much less its numerical parameters.
Although our algebraic arguments of Section 3 apply, analogs for GTR+Γ+I of
the analytic arguments we gave for GTR+Γ are not obvious. While the Γ rate
distribution has only one unknown parameter, Γ+I has two, and this increase in
dimensionality seems to be at the heart of the difficulty. Interestingly, empirical
studies [H] have also shown that these parameters can be difficult to tease apart,
as errors in their inferred values can be highly correlated in some circumstances.
Although we conjecture that GTR+Γ+I is identifiable for generic parameters,
we make no guess as to its identifiability for all parameters.

For computational reasons, standard software packages for phylogenetic
inference implement a discretized Γ distribution [17], rather than the continuous
one dealt with in this paper. While results on continuous distributions are
suggestive of what might hold in the discrete case, they offer no guarantee.
It would therefore also be highly desirable to have proofs of the identifiability
of the discretized variants of GTR+Γ and GTR+Γ+I, either for generic or all
parameters. Note that such results might depend on the number of discrete
rate classes used, as well as on other details of the discretization process. So
far the only result in this direction is that of [1] on the identifiability of the tree
parameter, for generic numerical parameter choices when the number of rate
classes is less than the number of observable character states (e.g., at most 3
rate classes for 4-state nucleotide models, or at most 60 rate classes for 61-state
codon models). As the arguments in that work use no special features of a
Γ distribution, or even of an across-site rate variation model, we suspect that
stronger claims should hold when specializing to a particular form of a discrete
rate distribution.

Finally, we mention that beyond [1], almost nothing is known on
identifiability of models with other types of heterogeneity, such as covarion-like
models and general mixtures. As these are of growing interest for addressing
biological questions [5],[10],[7], much remains to be understood.

Appendix A. The gaps in Rogers' proof

Here we explain the gaps in the published proof of Rogers [11] that the
GTR+Γ+I model is identifiable. Since that paper has been widely cited and
accepted as correct, our goal is to clearly indicate where the argument is flawed,
and illustrate, through some examples, the nature of the logical gaps.

We emphasize that we do not prove that the gaps in the published argument
cannot be bridged. Indeed, it seems most likely that the GTR+Γ+I model is
identifiable, at least for generic parameters, and it is possible a correct proof
might follow the rough outline of [11]. However, we have not been able to
complete the argument Rogers attempts. Our own proof of the identifiability
of the GTR+Γ model presented in the body of this paper follows a different
line of argument.

We assume the reader of this appendix will consult [11], as pinpointing the 
flaws in that paper requires rather technical attention to the details in it. 

A.1. Gaps in the published proof

There are two gaps in Rogers' argument which we have identified. In this 
section we indicate the locations and nature of these flaws, and in subsequent 
ones we elaborate on them individually. 

The first gap in the argument occurs roughly at the break from page 717 to
page 718 of the article. To explain the gap, we first outline Rogers' work leading
up to it. Before this point, properties of the graph of the function $\nu^{-1}(\mu(x))$
have been carefully derived. An example of such a graph, for particular values
of the parameters $\alpha, a, \pi, p$ occurring in the definitions of $\nu$ and $\mu$, is shown in
Figure 2 of the paper. For these parameter values and others, the article has
carefully and correctly shown that for $x > 0$ the graph of $\nu^{-1}(\mu(x))$

1. is increasing,

2. has a single inflection point, where the graph changes from convex to
concave (i.e., the concavity changes from upward to downward),

3. has a horizontal asymptote as $x \to \infty$.

Although the article outlines other cases for different ranges of the parameter
values, Rogers highlights the case when these three properties hold.

At the top of page 718 of the article, Figure 3 is presented, plotting the
points whose coordinates are given by the pairs $(\nu^{-1}(\mu(\tau_1 \lambda_i)), \nu^{-1}(\mu(\tau_2 \lambda_i)))$
for all $\lambda_i > 0$. Here $\tau_2 > \tau_1$ are particular values, while $\alpha, a, \pi, p$ are given the
values leading to Figure 2. Rogers points out that "As in Figure 2, the graph
[of Figure 3] has an inflection point, is concave upwards before the inflection
point, and is concave downwards after the inflection point." Then he claims
that "Similar graphs will be produced for any pair of path distances such that
$\tau_2 > \tau_1$." However, he gives no argument for this claim. As the remainder of the
argument strongly uses the concavity properties of the graph of his Figure 3 (in
the second column on page 718 the phrase "... as shown by Figure 3" appears),
without a proof of this claim the main result of the paper is left unproved.

Judging from the context in which it is placed, a more complete statement
of the unproved claim would be that for any values of $\alpha, a, \pi, p$ resulting in a
graph of $\nu^{-1}(\mu(x))$ with the geometric properties of Figure 2, and any $\tau_2 > \tau_1$,
the graph analogous to Figure 3 has a single inflection point. As no argument
is given to establish the claim, we can only guess what the author intended
for its justification. From what appears earlier in the paper, it seems likely
that the author believed the three geometric properties of the graph in Figure
2 enumerated above implied the claimed properties of Figure 3. However, that
is definitely not the case, as we will show in Section A.2 below.

Note that we do not assert that the graphs analogous to Figure 3 for various
parameter values are not as described in [11]. While plots of them for many
choices of parameter values certainly suggest that Rogers' claim holds, it is of
course invalid to claim a proof from examples. Moreover, with 4 parameters
$\alpha, a, \pi, p$ to vary, it is not clear how confident one should be of even having
explored the parameter space well enough to make a solid conjecture. In light
of the example we give in Section A.2, justifying Rogers' claim would require a
much more detailed analysis of the functions $\nu$ and $\mu$ than Rogers attempts.

If this first gap in the proof were filled, a second problem would remain. 
Though less fundamental to the overall argument, this gap would mean that 
identifiability of the model would be established for generic parameters, but 
that there might be exceptional choices of parameters for which identifiability 
failed. ('Generic' here can be taken to mean for all parameters except those 
lying in a set of Lebesgue measure zero in parameter space. More informally, 
for any reasonable probability distribution placed on the parameter space, 
randomly-chosen parameters will be generic.) 

Although the origin of this problem with non-generic parameters is clearly 
pointed out by Rogers, it is open to interpretation whether he attempts to 
extend the proof to all parameter values at the very end of the article. However, 
as the abstract and introductory material of [11] make no mention of the issue, 
this point at the very least seems to have escaped many readers' attention.

This gap occurs because the published argument requires that the non-zero 
eigenvalues of the GTR rate matrix Q be three distinct numbers. On page 
718, at the conclusion of the main argument, it is stated that "Therefore, if the 
substitution rate matrix has three distinct eigenvalues, the parameters of the 
I+Γ rate heterogeneity will be uniquely determined." The author then goes on
to point out that for the Jukes-Cantor and Kimura 2-parameter models this 
assumption on eigenvalues is violated, but "[f]or real data sets, however, it is 
unlikely that any two or all three of the eigenvalues will be exactly identical." 

Leaving aside the question of what parameters one might have for a model 
which fits a real data set well, Rogers here clearly indicates that his proof of 
identifiability up to this point omits some exceptional cases. In the concluding 
lines of the paper, he points out that these exceptional cases can be approxi- 
mated arbitrarily closely by parameters with three distinct eigenvalues. While 
this is true, such an observation cannot be used to argue that the exceptional
cases are not exceptional, as we will discuss below in Section A.3. It is unclear
whether the concluding lines of [11] were meant to 'fill the gap' or not.

Of course, one might not be too concerned about exceptional cases. Indeed, 
if the first flaw were not present in his argument, then Rogers' proof would 
still be a valuable contribution in showing that for 'most' parameter values 
identifiability held. One might then look for other arguments to show identifi- 
ability also held in the exceptional cases. Nonetheless, it is disappointing that 
the exceptional cases include models such as the Jukes-Cantor and Kimura 2- 
parameter that are well-known to biologists and might be considered at least 
reasonable approximations of reality in some circumstances. 

A.2. A counterexample to the graphical argument

It seems that the origin of the first flaw in Rogers' argument is in a belief 
that the three enumerated properties he proves are exhibited in his Figure 2 
result in the claimed properties of his Figure 3. In this section, we show this 
implication is not valid, by exhibiting a function whose graph has the three 
properties, but when the graph analogous to Figure 3 is constructed, it has 
multiple inflection points. 



Let



Then $f(0) = 0$, and



so $f'(x) > 0$ and $f$ is increasing. Furthermore, one sees that $f'(x)$ decays
quickly enough to $0$ as $x \to \infty$, so that $f(x)$ has a horizontal asymptote as
$x \to \infty$.

To see that $f(x)$ has a single inflection point where the graph passes from
convex to concave, it is enough to show $f'(x)$ has a unique local maximum and
no local minima. But this would follow from $g(x) = \ln(f'(x))$ having a unique
local maximum and no local minima. Since

$g(x) = \exp(-10(x - 1)^2) -$

and the two summands here have unique local maxima at $x = 1$ and no local
minima, $g$ must as well. Thus $f$ exhibits the enumerated properties of Rogers'
Figure 2. For comparison, we graph $f$ in our Figure 2 below.

Figure 2: The graph $y = f(x)$.

The analog of Figure 3 for the function $f$ would show the points $(f(\tau_1 x), f(\tau_2 x))$.
If we choose $\tau_1 = 1$, $\tau_2 = 2$, we obtain the graph shown in our Figure 3.
Obviously, the curve in Figure 3 has multiple (at least three) inflection
points. Although we will not give a formal proof here that this curve has
multiple inflection points, it is not difficult to do so.

Figure 3: The points $(f(x), f(2x))$.

A.3. Identifiability for generic parameters vs. all parameters

The second gap in Rogers' argument arises because it is possible to have
identifiability for generic parameters, but not for all parameters. Even if
identifiability of generic parameters has been proved, one cannot easily
argue that identifiability must hold for the non-generic, exceptional cases as
well. To illustrate this, we give a simple example.

Consider the map $\phi : \mathbb{R}^2 \to \mathbb{R}^2$, defined by

$$\phi(a, b) = (a, ab).$$

Here $a, b$ play the roles of 'parameters' for a hypothetical model, whose 'joint
distribution' is given by the vector-valued function $\phi$.

Suppose $(x, y)$ is a particular distribution which arises from the model (i.e.,
is in the image of $\phi$), and we wish to find $a, b$ such that $\phi(a, b) = (x, y)$. Then
provided $x \ne 0$ (or equivalently $a \ne 0$), it is straightforward to see that $a, b$
must be given by the formulas

$$a = x, \qquad b = y/x.$$

Thus for generic $a, b$ (more specifically, for all $(a, b)$ with $a \ne 0$) this hypothetical
model is identifiable.

Notice, however, that if $(x, y) = (0, 0)$, the situation is quite different. From
$x = 0$, we see that we must have $a = 0$. But since $\phi(0, b) = (0, 0)$, we find that
all parameters of the form $(0, b)$ lead to the same distribution $(0, 0)$. Thus these
exceptional parameters are not identifiable. Therefore, we have identifiability
precisely for all parameters in the 2-dimensional $ab$-plane except those lying on
the 1-dimensional line where $a = 0$. These exceptional parameters, forming a
set of lower dimension than the full space, have Lebesgue measure zero within
it.

Notice that even though there are parameter values arbitrarily close to the
exceptional ones $(0, b)$ which are identifiable (for instance, $(\epsilon, b)$ for any small
$\epsilon \ne 0$), it is invalid to argue that the parameters $(0, b)$ must be identifiable as
well.
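
The toy example can be made concrete in a few lines of Python. This is only an illustrative sketch: the function names `phi` and `recover` are ours, and `recover` returns `None` to signal that the parameters cannot be determined from the 'distribution'.

```python
def phi(a, b):
    """The toy parameterization phi(a, b) = (a, a*b)."""
    return (a, a * b)


def recover(x, y):
    """Invert phi where possible.

    For x != 0 the preimage is the single point (x, y / x).
    For x == 0 every (0, b) maps to (0, 0), so no unique answer exists
    and None is returned.
    """
    if x == 0:
        return None
    return (x, y / x)


# Generic parameters are recovered exactly.
print(recover(*phi(2.0, 3.0)))

# Distinct exceptional parameters give the same image.
print(phi(0.0, 1.0), phi(0.0, 5.0))

# Nearby generic parameters remain identifiable, however small a is.
print(recover(*phi(1e-6, 5.0)))
```

The last call illustrates the point of the paragraph above: identifiability at parameters arbitrarily close to $(0, b)$ says nothing about identifiability at $(0, b)$ itself.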

This example shows that even if the first flaw in the argument of [11] were
repaired, the approach outlined there will at best give identifiability for generic
parameters. The final lines of that paper are not sufficient to prove
identifiability for all parameter values.

Obviously the function $\phi$ given here could not really be a joint distribution for
a statistical model, since the entries of the vector $\phi(a, b)$ do not add to one, nor
are they necessarily non-negative. However, these features can be easily worked
into a more complicated example. If one prefers a less contrived example, then
instances of generic identifiability of parameters but not full identifiability occur
in standard statistical models used outside of phylogenetics (for instance, in
latent class models). We have chosen to give this simpler example to highlight
the essential problem most clearly.



